Abstract

Benjamini and Hochberg (1995) proposed the false discovery rate (FDR) 
as an alternative to the family-wise error rate in multiple testing prob- 
lems, and proposed a procedure to control the FDR. For discrete data this 
procedure may be highly conservative. We investigate alternative, more 
powerful, procedures that exploit the discreteness of the tests and have 
FDR levels closer in magnitude to the desired nominal level. Moreover, 
we develop a novel step-down procedure that dominates the step-down 
procedure of Benjamini and Liu (1999) for discrete data. We consider 
an application to pharmacovigilance spontaneous reporting systems, which
serve for the early detection of adverse reactions to marketed drugs.

1 Introduction 

In many modern applications the data is discrete, and hundreds, or even thou- 
sands, of hypotheses are simultaneously tested. In order to control for false- 
positives, a multiple testing procedure may be applied that either controls the 
probability of at least one false positive (family-wise error) or the expected
proportion of true null hypotheses rejected out of all rejected hypotheses, known as
the false discovery rate (FDR, Benjamini and Hochberg, 1995). Researchers have
been active in developing methodologies for controlling the FDR; see Benjamini
(2010) for an overview. However, little has been written about controlling the
FDR when the data is discrete.

In many modern applications, it may be more appropriate to apply a multiple
testing procedure that controls the false discovery rate (FDR) than a
family-wise error controlling procedure. One such application with discrete data comes
from pharmacovigilance systems for marketed medicines, which collect and monitor
spontaneous reports of suspected adverse events from health-care providers.
In order to detect new adverse drug reactions after the drug marketing approval,
multiple hypotheses of no association between drugs and adverse events
are simultaneously tested periodically in the pharmacovigilance databases.
Pharmacovigilance and all drug safety issues are relevant for everyone whose life is
touched in any way by medical interventions. In the analysis of pharmacovig- 
ilance systems, the aim is not to substitute the expertise of the pharmacovigi- 
lance experts but rather to draw attention to unexpected associations by acting 
as hypothesis generators. The associations thus found are then destined to be 
further investigated, so it is possible to tolerate a few false discoveries as long as
they are a small fraction of the discoveries. 

Another modern application area is genomics research. High-throughput 
next generation sequencing (HT-NGS) technologies output a list of sequence 
reads. These sequences are mapped to their genomic locations. The data for
statistical analysis is tag counts. In order to test for enriched regions, the null 
distribution of counts is used. Discrete data is also encountered in genome wide 
association studies, where the minor allele frequencies of diseased and non-diseased
individuals are compared simultaneously at hundreds of thousands of
single-nucleotide polymorphisms.

The procedure introduced by Benjamini and Hochberg (1995), henceforth
referred to as the BH procedure, is the original and still very popular multiple
testing procedure for controlling the FDR. If the null distribution of the
p-values is uniform and the p-values are independent, then the FDR of the BH
procedure at level q is $\frac{m_0}{m}q$, where m and $m_0$ are the number of hypotheses
and the number of true null hypotheses respectively (Benjamini and Hochberg,
1995). However, for discrete test statistics the null distribution of the p-values
is stochastically larger than the uniform, and therefore the FDR of the BH
procedure may be much smaller than $\frac{m_0}{m}q$. This is so because the expression
for the FDR of the BH procedure involves sums with the terms
$\Pr_{H_i}(P_i \le \frac{k}{m}q)$ (Benjamini and Yekutieli, 2001), where $P_i$ is the p-value of a true null
hypothesis $H_i$ and k = 1, ..., m. If the null distribution of the p-value is uniform,
then $\Pr_{H_i}(P_i \le \frac{k}{m}q) = \frac{k}{m}q$. But for discrete data, $\Pr_{H_i}(P_i \le \frac{k}{m}q)$ may be
less than $\frac{k}{m}q$ and, the greater the gaps between $\Pr_{H_i}(P_i \le \frac{k}{m}q)$ and $\frac{k}{m}q$, the
smaller the true FDR level of the BH procedure. Thus, the BH procedure may
be conservative for discrete data, in the sense that its actual FDR level may
be smaller than $\frac{m_0}{m}q$. Note that this conservatism does not go away with an
increase in the number of hypotheses, nor with modifications of the original
BH procedure that can provide higher power by incorporating an estimate of
the number of null hypotheses (such as the adaptive procedure in
Benjamini et al. (2006)).
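
To make the source of this conservatism concrete, the following Python sketch lists the achievable p-values of a one-sided (lower-tail) Fisher's exact test for one set of fixed margins and evaluates the null probability Pr(P <= t) at a few thresholds t. The margins (20 treated, 23 controls, 5 events in total) and all function names are illustrative assumptions, not taken from the paper.

```python
from math import comb

def hypergeom_pmf(x, n1, n2, k):
    """P(X = x) when k events fall at random into groups of size n1 and n2."""
    return comb(n1, x) * comb(n2, k - x) / comb(n1 + n2, k)

def achievable_pvalues(n1, n2, k):
    """Sorted achievable lower-tail p-values P(X <= x) for the fixed margins."""
    lo, hi = max(0, k - n2), min(n1, k)
    pvals, cum = [], 0.0
    for x in range(lo, hi + 1):
        cum += hypergeom_pmf(x, n1, n2, k)
        pvals.append(min(cum, 1.0))  # guard against rounding slightly above 1
    return pvals

def null_cdf(pvals, t):
    """Pr_H(P <= t): the largest achievable p-value not exceeding t (0 if none)."""
    below = [p for p in pvals if p <= t]
    return max(below) if below else 0.0

# Illustrative margins only (an assumption for this sketch).
atoms = achievable_pvalues(n1=20, n2=23, k=5)
for t in (0.01, 0.05, 0.10):
    print(f"t = {t:.2f}   Pr(P <= t) = {null_cdf(atoms, t):.4f}")
```

For a uniform null p-value the printed probability would equal t; here it is strictly smaller at each threshold, which is the gap that makes the BH procedure conservative.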



Few other approaches that take the discreteness into account for FDR control
have been suggested in the literature. Kulinskaya and Lewin (2009) suggested
an FDR controlling procedure using randomized p-values (from randomized
tests) to account for the discreteness of the null distribution, thus guaranteeing
that the p-values are uniformly distributed under the null, and therefore that
the FDR is controlled exactly at the desired level when the p-values are independent.
Interpretation of results is not straightforward in this case though, due to
the randomness of the p-values. Gilbert (2005) proposed a two-step FDR controlling
procedure for discrete data. First, remove the null hypotheses with test
statistics that are unable to reach a certain level of significance. Second, apply
the BH procedure to the remaining hypotheses. This approach does not exploit
the discreteness of the test statistics that are not removed in the first step. Heyse
(2011) suggested a discrete BH procedure that exploits more fully the discrete
null distributions of the test statistics, and demonstrated in simulations that it
has a power advantage over the procedure of Gilbert (2005). However, the procedure
of Gilbert (2005) controls the FDR while the procedure in Heyse (2011)
may be anti-conservative. Ahmed et al. (2010) used midP-values in conjunction
with an FDR controlling procedure for the analysis of pharmacovigilance
systems, and provided simulation results that suggest that it is an improvement
over using the p-values.

Our first aim in this work is to study the properties of the BH procedure
using midP-values. We will prove that the actual FDR level of the BH procedure
based on midP-values is closer to the nominal level than that of the BH procedure
based on p-values. We will also derive an upper bound on the FDR level of
the BH procedure based on midP-values. A straightforward modification of the
procedure in Gilbert (2005) is to apply this procedure using midP-values.
We will compare and contrast the resulting new procedure with the procedure
suggested by Heyse (2011).

The BH procedure is a step-up procedure. Benjamini and Liu (1999) suggested
a step-down procedure for FDR control, henceforth called the BL procedure.
Our second aim in this work is to study discrete analogues of the BL
procedure. We develop a novel discrete BL procedure and prove that it controls
the FDR at the nominal level. We will compare and contrast this novel
procedure, which has proven FDR control, with the new procedure that first
removes the null hypotheses with test statistics that are unable to reach
the nominal level of significance and then applies the BL procedure on
midP-values.

The paper is organized as follows. Section 2 introduces the relevant procedures
and discusses theoretical properties of these procedures. Section 3 applies
the procedures to an example from a pharmacovigilance database. In the example,
more suspect drugs can indeed be discovered with the procedures that take
discreteness into account. Section 4 evaluates the proposed procedures by simulation,
and Section 5 concludes with final remarks. An R package, discreteMTP,
that performs the step-up and the step-down discrete multiple testing variants of
BH and BL, respectively, is available from the first author's web page or from CRAN.

Table 1: Table relating treatment to adverse event, for 10 studies. Among the
treated, the occurrences and nonoccurrences were X11 and X12 respectively;
among the controls, the occurrences and nonoccurrences were X21 and X22
respectively; the p-value was computed from a one-sided Fisher's exact test for
2x2 tables; the midP-value is the average of the p-value with the next smaller
p-value that could possibly be observed in Fisher's exact test with the same
fixed margins as observed.

Study   X11   X12   X21   X22   p-value   midP-value
1         1    15    13     3     0.000        0.000
2         2    36    12    20     0.001        0.000
3         1    14     7     6     0.009        0.005
4        10    30    12     8     0.009        0.006
5         0    20     5    18     0.035        0.017
6         2     5     7     2     0.072        0.039
7         8    16    15    12     0.095        0.062
8         3    11     7    15     0.389        0.267
9         5    12     5    10     0.555        0.411
10        7    14     5    20     0.914        0.834



2 FDR controlling procedures for discrete data 



Consider a family of m hypotheses $H_1, \ldots, H_m$ with corresponding p-values
$P_1, \ldots, P_m$. Let $I_0$ be the set of indices of the true null hypotheses, and let $m_0 = |I_0|$
be the number of true null hypotheses. Sorting these p-values, we get
$P_{(1)} \le \ldots \le P_{(m)}$ with corresponding null hypotheses $H_{(1)}, \ldots, H_{(m)}$.

In this section we illustrate the different FDR controlling procedures using a
small data example, summarized in Table 1, that relates treatment to an adverse
event for 10 studies. This is a subset of the 41 studies considered in Efron (1996).



2.1 The BH procedure on midP-values

Table 2: The adjusted p-values from the multiple testing procedures on the
p-values or midP-values of Table 1. The columns from left to right are the adjusted
p-values from (1) the BH procedure on the p-values; (2) the BH procedure on
the midP-values; (3) the DBH procedure; (4) the BL procedure on p-values; (5)
the BL procedure on midP-values; (6) the DBL procedure.

Study   BH-adjusted   midP+BH-adjusted   DBH-adjusted   BL-adjusted   midP+BL-adjusted   DBL-adjusted
1             0.000              0.000          0.000         0.000              0.000          0.000
2             0.004              0.002          0.001         0.007              0.004          0.002
3             0.023              0.014          0.012         0.054              0.029          0.023
4             0.023              0.014          0.012         0.054              0.029          0.023
5             0.070              0.035          0.038         0.115              0.060          0.077
6             0.119              0.064          0.062         0.155              0.089          0.078
7             0.135              0.089          0.082         0.155              0.091          0.107
8             0.486              0.333          0.351         0.231              0.182          0.200
9             0.617              0.457          0.442         0.231              0.182          0.200
10            0.914              0.834          0.846         0.231              0.182          0.200

The midP-value was suggested by Lancaster (1961) to replace the p-value in
discrete tests. The p-values are made smaller by averaging the observed
p-value with the next smaller p-value that could possibly be observed. The
probability of observing a midP-value less than $\alpha$ should better approximate
the nominal level $\alpha$, because the distribution of the midP-value under the null
hypothesis is closer to uniform than is that of the p-value. For motivation and
theoretical justifications, see Lancaster (1961), Routledge (1994), Berry and Armitage
(1995) and Fellows (2010). However, unlike the p-value, a test based on the
midP-value may exceed the nominal significance level $\alpha$. Agresti and Gottard
(2007) write "..we believe it is more sensible to use a method for which the actual
error rate is closer to the nominal error rate than happens with traditional
exact inference. Inference based on the midP-value is a simple way to achieve
this goal." Similarly, when testing simultaneously multiple hypotheses using
p-value based multiple testing procedures, we argue that it is more sensible to use
midP-values in place of p-values.
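
The definition of the midP-value as the average of the observed p-value and the next smaller achievable p-value can be sketched in a few lines of Python. The function below takes the four cell counts X11, X12, X21, X22 of a 2x2 table; the choice of the lower tail in X11 is an assumption inferred from the tabulated values of Table 1, and the function name is hypothetical.

```python
from math import comb

def fisher_lower_tail(x11, x12, x21, x22):
    """Return (p-value, midP-value) of a lower-tail Fisher exact test.

    The margins (row totals and the first column total) are held fixed.
    """
    n1, n2, k = x11 + x12, x21 + x22, x11 + x21
    total = comb(n1 + n2, k)

    def pmf(x):
        # Hypergeometric probability of observing x events in the first row.
        return comb(n1, x) * comb(n2, k - x) / total

    lo = max(0, k - n2)
    # Next smaller achievable p-value: P(X <= x11 - 1), or 0 if x11 is minimal.
    p_below = sum(pmf(x) for x in range(lo, x11))
    p_value = p_below + pmf(x11)
    mid_p = 0.5 * (p_value + p_below)
    return p_value, mid_p

# Three rows of Table 1 (studies 3, 5 and 8).
for counts in [(1, 14, 7, 6), (0, 20, 5, 18), (3, 11, 7, 15)]:
    p, mid = fisher_lower_tail(*counts)
    print(counts, round(p, 3), round(mid, 3))
```

Under the lower-tail assumption, the rounded output agrees with the p-value and midP-value columns of Table 1 for these three studies.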

Since the midP-values are smaller than the p-values, using midP-values
instead of p-values in an FDR controlling procedure will lead to at least as
many rejections as the FDR controlling procedure based on p-values. For the
small data example summarized in Table 1, the adjusted p-values from a BH
procedure on p-values and on midP-values are summarized in Table 2. From
Table 2 we see that applying the BH procedure at level q = 0.1 on p-values and
on midP-values led to 5 and 7 rejections respectively.
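
The rejection counts can be checked with a short Python sketch of the BH adjustment. It uses the standard step-up form of the BH-adjusted p-value (the minimum of m p_(i) / i over ranks i at or above the current rank) and the three-decimal values printed in Table 1, so the adjusted values are only approximate; the function name is hypothetical.

```python
def bh_adjusted(pvalues):
    """BH-adjusted p-values: for rank j, the minimum over i >= j of m * p_(i) / i."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, m * pvalues[idx] / rank)
        adjusted[idx] = running_min
    return adjusted

# Rounded values as printed in Table 1.
p_values = [0.000, 0.001, 0.009, 0.009, 0.035, 0.072, 0.095, 0.389, 0.555, 0.914]
midp_values = [0.000, 0.000, 0.005, 0.006, 0.017, 0.039, 0.062, 0.267, 0.411, 0.834]

q = 0.1
n_rej_p = sum(a <= q for a in bh_adjusted(p_values))
n_rej_midp = sum(a <= q for a in bh_adjusted(midp_values))
print(n_rej_p, n_rej_midp)
```

Rejecting every hypothesis whose adjusted value is at most q reproduces the 5 and 7 rejections reported for the p-values and midP-values.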

Since the distribution of the midP-value is closer to the uniform than the distribution
of the p-value, the true FDR level of FDR controlling procedures that use midP-values will be closer
to the nominal level than the true FDR level of the original procedures that use
p-values. We formalize this statement for the BH procedure on midP-values.

Proposition 2.1. Let midFDR and origFDR be the true FDR levels of the
BH procedure at level q on midP-values and on p-values
respectively. Then if the p-values are independent,

$$\left| \frac{m_0}{m}q - \mathrm{midFDR} \right| \le \frac{m_0}{m}q - \mathrm{origFDR}.$$

Note that the true levels of the BH procedure at level q on midP-values
and on p-values vary with the probability distributions of the p-values, so a
technically more precise statement of the inequality in Proposition 2.1 that does
not suppress the dependence of the true FDR levels on the true data generating
distributions is

$$\left| \frac{m_0}{m}q - \mathrm{midFDR}(\mathcal{L}(P_1), \ldots, \mathcal{L}(P_m)) \right| \le \frac{m_0}{m}q - \mathrm{origFDR}(\mathcal{L}(P_1), \ldots, \mathcal{L}(P_m)),$$

where $\mathcal{L}(P_i)$ denotes the true probability distribution of the p-value $P_i$.

From the above proposition, it follows that applying the BH procedure on
midP-values will result in an FDR level of at most $2\frac{m_0}{m}q - \mathrm{origFDR}$. A tighter
upper bound on midFDR, which is calculable from the known discrete null
distributions, is derived in the following proposition.
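
The bound that follows from Proposition 2.1 is a one-line rearrangement, written out here for completeness. The shorthand $a$ is introduced only for this display, and the sketch assumes that the nominal quantity in the proposition is $a = \frac{m_0}{m}q$, the FDR of the BH procedure under uniform, independent p-values.

```latex
\left| a - \mathrm{midFDR} \right| \le a - \mathrm{origFDR}
\;\Longrightarrow\;
\mathrm{midFDR} - a \le a - \mathrm{origFDR}
\;\Longrightarrow\;
\mathrm{midFDR} \le 2a - \mathrm{origFDR}
  = 2\,\frac{m_0}{m}\,q - \mathrm{origFDR},
\qquad a = \frac{m_0}{m}\,q .
```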

Proposition 2.2. Let $e_i = \max_{k \in \{1,\ldots,m\}} \Pr_{H_i}(midP_i \le \frac{k}{m}q) / (\frac{k}{m}q)$. If the p-values are
independent, $\mathrm{midFDR} \le \frac{q}{m}\sum_{i \in I_0} e_i$.

It follows that $\mathrm{midFDR} \le \frac{q}{m}\sum_{i=1}^{m} e_i$, and this upper bound can be computed
from the known null distributions. Note that $e_i$ is bounded above by
2, since for midP-values $\Pr_{H_i}(midP_i \le x) \le 2x - \Pr_{H_i}(P_i \le x)$. Therefore,
$\mathrm{midFDR} \le 2q$, but this upper bound may be far from tight. Depending on the
exact null distributions, the upper bound $\frac{q}{m}\sum_{i=1}^{m} e_i$ may be much closer to q
than to 2q.

Incorporating the simple two-step combination of the Tarone and BH procedures
in Gilbert (2005) on midP-values results in the following procedure.

Procedure 2.1. The Tarone+midP adjusted BH procedure at level q is the
following two-step procedure:

1. Compute the minimum achievable significance level for hypothesis $H_i$, call
it $q_i^*$. For each k = 1, ..., m, let m(k) be the number of hypotheses for
which $q_i^* < c \cdot q/k$ (where $c \ge 1$ is a predefined constant), and let K be
the smallest value of k such that $m(k) \le k$. Let $I_K \subseteq \{1, \ldots, m\}$ be the
set of indices satisfying $q_i^* < q/K$. $I_K$ contains the set of m(K) indices
of hypotheses for which the minimum achievable significance level is below
the Bonferroni threshold when testing m(K) hypotheses.

2. Apply the BH procedure on the midP-values of the family of hypotheses
with indices in the set $I_K$.

The two-step procedure will typically have higher power than the BH procedure
on midP-values, with the gain in power generally increasing with m − m(K).
Propositions 2.1 and 2.2 hold for the Tarone+midP adjusted BH procedure,
with m replaced by m(K), since the first step in Procedure 2.1 selects
the subset of hypotheses with indices $I_K$ for testing solely based on the null
distributions of the p-values, without looking at the realized p-values.
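
Step 1 of the two-step procedure can be sketched as follows. The function name, the default c = 1 and the example levels are assumptions made for illustration; the minimum achievable significance levels are taken as given, since they depend on the null distribution of each test.

```python
def tarone_select(min_levels, q, c=1.0):
    """Sketch of step 1: choose K and the index set I_K.

    min_levels[i] is the minimum achievable significance level of hypothesis i.
    Returns (K, selected_indices); the default c = 1 is an assumption.
    """
    m = len(min_levels)
    if m == 0:
        return 0, []
    K = m
    for k in range(1, m + 1):
        # m(k): number of hypotheses whose minimum level is below c * q / k
        m_k = sum(level < c * q / k for level in min_levels)
        if m_k <= k:
            K = k
            break
    selected = [i for i, level in enumerate(min_levels) if level < q / K]
    return K, selected

# Hypothetical minimum achievable levels for five hypotheses.
K, selected = tarone_select([0.5, 0.001, 0.002, 0.2, 0.01], q=0.05)
print(K, selected)
```

Because the selection uses only the minimum achievable levels and never the observed p-values, step 2 can then be applied to the selected hypotheses alone.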

2.2 A discrete step-up procedure 



The BH-adjusted p-values (Benjamini et al., 2006) are

$$p_{(j)}^{BHadj} = \min_{i \ge j} \frac{m\, p_{(i)}}{i}.$$

The BH procedure at level q is equivalent to rejecting all hypotheses with
BH-adjusted p-value $\le q$. Motivated by this formulation of the BH procedure,
Heyse (2011) suggested the following discrete analogue, henceforth called the
DBH procedure. The DBH procedure adjusted p-values are

$$p_{(j)}^{DBHadj} = \min_{i \ge j} \frac{\sum_{l=1}^{m} \Pr_{H_l}(P_l \le p_{(i)})}{i}.$$

The DBH procedure at level q is equivalent to rejecting all hypotheses with
DBH-adjusted p-value $\le q$.

The gain from using the DBH procedure over the BH procedure comes from
the fact that $\Pr_{H_l}(P_l \le p_{(i)}) \le p_{(i)}$. If hypothesis $H_l$ cannot achieve a p-value
at or below $p_{(i)}$ then $\Pr_{H_l}(P_l \le p_{(i)}) = 0$ and the dimensionality of the multiple
comparisons problem is reduced. If hypothesis $H_l$ can achieve a p-value at or below
$p_{(i)}$ then $\Pr_{H_l}(P_l \le p_{(i)}) \le p_{(i)}$ and a smaller quantity adds to $p_{(j)}^{DBHadj}$. On the
other hand, if all the null distributions are identical then $\Pr_{H_l}(P_l \le p_{(i)}) = p_{(i)}$
and there is no gain in using the DBH procedure over the original BH procedure.
The following proposition summarizes these observations.

Proposition 2.3. The DBH procedure rejects at least as many null hypotheses 
as the BH procedure. However, if all null distributions are the same then the 
DBH procedure rejects exactly the same null hypotheses as the BH procedure. 

For the small example in Table 1, Table 2 shows that the adjusted p-values
from the DBH procedure are smaller than the adjusted p-values from the BH
procedure on p-values, but not necessarily smaller than the adjusted p-values
from the BH procedure on midP-values.

An example where the DBH procedure does not control the FDR 

Note that if R hypotheses are rejected by the DBH procedure, then
$\sum_{l=1}^{m} \Pr_{H_l}(P_l \le p_{(R)})/R \le q$, but this does not guarantee that the FDR is controlled, since the FDR is
$E(V/\max(R,1))$, where V is the number of true null hypotheses rejected, and this quantity may be larger than q, as the following
example demonstrates. Let $P_1$ be a p-value with atoms at 0.02, 0.045 and 1, and
let $P_2$ be a p-value independent of $P_1$ with atoms at 0.03, 0.055 and 1. For
$m = m_0 = 2$ and q = 0.05, the FDR is equal to

$$\begin{aligned}
\Pr(V > 0) &= \Pr(P_1 = 0.02) + \Pr(P_2 = 0.03) - \Pr(P_1 = 0.02) \times \Pr(P_2 = 0.03) \\
&\quad + \Pr(P_1 = 0.045) \times \Pr(P_2 = 0.055) \\
&= 0.02 + 0.03 - 0.02 \times 0.03 + 0.025 \times 0.025 = 0.050025.
\end{aligned}$$
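
The counterexample can be verified by brute force. The Python sketch below enumerates the nine joint outcomes of the two p-values, applies the DBH rule at level q = 0.05 (the level at which the stated value 0.050025 arises), and sums the probabilities of the outcomes with at least one rejection. Exact rational arithmetic is used because several comparisons fall exactly on the threshold; all function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def null_cdf(atoms, t):
    """Pr_H(P <= t) for a discrete null p-value whose atoms a satisfy Pr(P <= a) = a."""
    below = [a for a in atoms if a <= t]
    return max(below) if below else F(0)

def dbh_rejections(pvalues, supports, q):
    """Number of DBH rejections: the largest i with sum_l Pr_{H_l}(P_l <= p_(i)) / i <= q."""
    n_rejected = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        total = sum(null_cdf(atoms, p) for atoms in supports)
        if total / i <= q:
            n_rejected = i
    return n_rejected

def atom_probs(atoms):
    """Point masses implied by Pr(P <= a) = a at consecutive atoms."""
    return [a - prev for prev, a in zip([F(0)] + atoms[:-1], atoms)]

supports = [[F("0.02"), F("0.045"), F(1)], [F("0.03"), F("0.055"), F(1)]]
q = F("0.05")

fdr = F(0)
for (p1, w1), (p2, w2) in product(*(zip(s, atom_probs(s)) for s in supports)):
    # Both hypotheses are true nulls, so any rejection gives a false discovery proportion of 1.
    if dbh_rejections([p1, p2], supports, q) > 0:
        fdr += w1 * w2
print(float(fdr))
```

The enumeration yields exactly 2001/40000, which is slightly above the nominal level.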

2.3 A discrete step-down procedure 

The BL procedure (Benjamini and Liu, 1999) is a step-down multiple comparisons
procedure for FDR control, so it compares the smallest p-value with
the first critical value, and proceeds to compare the second smallest p-value
with the second critical value only if the smallest p-value was below its critical
value; as soon as a p-value is above its critical value, no further comparisons
are made. The critical values in the BL procedure are

$$\delta_i = 1 - \left[1 - \min\left(1, \frac{m}{m-i+1}q\right)\right]^{1/(m-i+1)}, \quad i = 1, \ldots, m.$$

The procedure finds the smallest p-value among all those satisfying $p_{(i)} > \delta_i$,
calls it $p_{(R+1)}$, and rejects the R null hypotheses
whose p-value is at most $p_{(R)}$. Benjamini and Liu (1999) proved that
this procedure controls the FDR at the nominal level q for independent test
statistics, and Sarkar (2002) demonstrated that the FDR is also controlled if
the test statistics are positive dependent in some sense. Benjamini and Liu
(1999) show that the BL procedure neither dominates nor is dominated by the
BH procedure.

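We consider the following new discrete analogue to the BL procedure, henceforth
called the DBL procedure. The DBL procedure will use the following
critical values

$$\delta_i = \max\left\{ \max\left\{ z : \frac{m-i+1}{m}\left(1 - \prod_{l=i}^{m}\left(1 - \Pr_{H_{(l)}}(P_{(l)} \le z)\right)\right) \le q \right\}, \delta_{i-1} \right\}, \quad \delta_0 = 0.$$

The correspondence between the BL and DBL procedures can best be seen
by expressing their respective adjusted p-values. The BL adjusted p-values are

$$p_{(i)}^{BLadj} = \max\left\{ \frac{m-i+1}{m}\left(1 - (1 - p_{(i)})^{m-i+1}\right), p_{(i-1)}^{BLadj} \right\}, \quad p_{(0)}^{BLadj} = 0.$$

The discrete BL adjusted p-values are

$$p_{(i)}^{DBLadj} = \max\left\{ \frac{m-i+1}{m}\left(1 - \prod_{l=i}^{m}\left(1 - \Pr_{H_{(l)}}(P_{(l)} \le p_{(i)})\right)\right), p_{(i-1)}^{DBLadj} \right\}, \quad p_{(0)}^{DBLadj} = 0.$$

Since $\Pr_{H_{(l)}}(P_{(l)} \le p_{(i)}) \le p_{(i)}$, it follows that $p_{(i)}^{DBLadj} \le p_{(i)}^{BLadj}$.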
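
As a check on the step-down adjustment, the following Python sketch computes BL-adjusted p-values for the three-decimal p-values of Table 1. It assumes the adjusted value at rank i is the running maximum of (m - i + 1)/m times 1 - (1 - p_(i))^(m - i + 1), which is the form obtained by inverting the BL critical values; because the inputs are rounded, agreement with Table 2 is only approximate. The function name is hypothetical.

```python
def bl_adjusted(pvalues):
    """Step-down BL-adjusted p-values (assumed form, obtained by inverting the critical values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order, start=1):
        remaining = m - rank + 1  # hypotheses not yet passed in the step-down
        candidate = remaining / m * (1.0 - (1.0 - pvalues[idx]) ** remaining)
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

# Rounded p-values as printed in Table 1.
p_values = [0.000, 0.001, 0.009, 0.009, 0.035, 0.072, 0.095, 0.389, 0.555, 0.914]
bl_adj = bl_adjusted(p_values)
print([round(a, 3) for a in bl_adj])
print(sum(a <= 0.1 for a in bl_adj))
```

The running maximum is what makes the last three adjusted values coincide, as they do in the BL column of Table 2.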

Proposition 2.4. For independent test statistics, the DBL procedure controls 
the FDR at the nominal level. 

See Appendix C for the proof.

The proposition implies that for independent test statistics, the DBL procedure
should always be preferred over the BL procedure with discrete data, since
it is uniformly more powerful than the BL procedure and has guaranteed
FDR control. For the small example in Table 1, Table 2 shows that the adjusted
p-values from the DBL procedure are smaller than the adjusted p-values from
the BL procedure on p-values, but not necessarily smaller than the adjusted
p-values from the BL procedure on midP-values.

Relaxation of the independence assumption in Proposition 2.4. For
FDR control of the DBL procedure, it is enough to assume that the joint
distribution of statistics from true nulls is independent of the joint distribution
of statistics from false nulls and that the Sidak inequality is satisfied on
the test statistics from true nulls. There is no restriction on the joint dependency
of statistics from false nulls. Sidak's inequality (Ge et al., 2003) is
$\Pr(P_1 > p_1, \ldots, P_m > p_m) \ge \prod_{i=1}^{m} \Pr(P_i > p_i)$.






3 An Example 



The Medicines and Healthcare products Regulatory Agency (MHRA, http://www.mhra.gov.uk/)
in the United Kingdom operates post-marketing surveillance for reporting,
investigating and monitoring adverse drug reactions to medicines and incidents
with medical devices. Its database contains complete listings of all
suspected adverse drug reactions or side effects that have been reported by
healthcare professionals and patients to the MHRA via the Yellow Card Scheme
(http://yellowcard.mhra.gov.uk/). The Yellow Card Scheme receives more
than 20,000 reports of possible side effects each year. Half a million reports were
received in the scheme's first 40 years. In 2007, more than 500 defects related
to medicines were reported to the MHRA, resulting in the issue of more than
30 Drug Alerts. All reports made to the MHRA on suspected reactions to drugs
are listed in the Drug Analysis Prints. We use data from the Drug Analysis
Prints for illustration.

To investigate the association between reports of amnesia and suspected 
drugs, we extracted the number of reported cases of amnesia as well as the total 
number of adverse events reports for each of the 2466 drugs in the database. From
the total of 686911 adverse events reports, 2051 contained cases of amnesia. For 
each drug, the association between the drug and amnesia was tested by a one- 
sided Fisher's exact test. Specifically, for drug i the 2x2 contingency table for 
testing for association with amnesia was 





              Amnesia          Not Amnesia
Drug i        A11(i)           A12(i)
Other drugs   2051 - A11(i)    686911 - 2051 - A12(i)



where A11(i) is the number of amnesia cases reported for drug i, and
A11(i) + A12(i) is the number of adverse event reports for drug
i. Table 3 shows the adjusted p-values from the following 6 procedures: the BH
procedure on the p-values and on the midP-values respectively, the DBH procedure,
the BL procedure on the p-values and on the midP-values respectively,
and the DBL procedure. The adjusted p-values using the discrete variants of
the BH or BL procedure were indeed smaller than those of the original procedures, and
provide more discoveries at a predefined FDR level. Specifically, at the nominal
level of q = 0.05, the number of drugs discovered to be associated with
amnesia by the original BH procedure on p-values was 23, and there were two
additional discoveries using the BH procedure on midP-values. Applying the
DBH procedure provided a total of 27 discoveries.

Should we indeed pay attention to the additional discoveries provided by
the discrete step-up procedures over the BH procedure? For one such discovery,
Bupropion, the answer is clearly positive. The adjusted p-values for the drug
Bupropion by the BH procedure on p-values, on midP-values, and by the DBH
procedure were, respectively, 0.0503, 0.0392, and 0.0134. Therefore, at level
q = 0.05 this association would not be discovered by the original BH procedure
but would be discovered by the two discrete analogues. Evidence that Bupropion
is associated with memory disorders in the French pharmacovigilance database
was reported by Chavant et al. (2011).

Table 3: The 27 smallest adjusted p-values from the multiple testing procedures
on the p-values or midP-values of the adverse event data in Section 3. The
columns from left to right are the adjusted p-values from (1) the BH procedure
on the p-values; (2) the BH procedure on the midP-values; (3) the DBH procedure;
(4) the BL procedure on p-values; (5) the BL procedure on midP-values;
(6) the DBL procedure.

Drug                 BH       BH midPV   DBH      BL       BL midPV   DBL
BUPROPION            0.0503   0.0392     0.0134   0.6909   0.6003     0.2681
GABAPENTIN           0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
INDOMETHACIN         0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
LACOSAMIDE           0.0331   0.0176     0.0085   0.5254   0.3270     0.1734
LEVETIRACETAM        0.0054   0.0033     0.0014   0.0958   0.0592     0.0258
LITHIUM              0.0001   0.0001     0.0000   0.0020   0.0012     0.0004
LORAZEPAM            0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
MEFLOQUINE           0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
MIDAZOLAM            0.0023   0.0013     0.0006   0.0353   0.0202     0.0095
OXCARBAZEPINE        0.1377   0.0752     0.0349   0.9645   0.8569     0.5942
PREGABALIN           0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
RIMONABANT           0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
SERTRALINE           0.0695   0.0490     0.0186   0.8131   0.6961     0.3621
SIMVASTATIN          0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
STRONTIUM RANELATE   0.0066   0.0038     0.0019   0.1222   0.0728     0.0360
TEMAZEPAM            0.0001   0.0001     0.0000   0.0012   0.0007     0.0003
TOPIRAMATE           0.0046   0.0028     0.0012   0.0739   0.0465     0.0206
TRIAZOLAM            0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
VARENICLINE          0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
VIGABATRIN           0.0050   0.0030     0.0014   0.0854   0.0526     0.0236
ZOLPIDEM             0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
ZOPICLONE            0.0000   0.0000     0.0000   0.0000   0.0000     0.0000
CITALOPRAM           0.0010   0.0007     0.0002   0.0141   0.0096     0.0032
DEXAMPHETAMINE       0.0073   0.0038     0.0020   0.1398   0.0753     0.0406
ETHANOL              0.1274   0.0695     0.0325   0.9527   0.8243     0.5562
FLUOXETINE           0.0236   0.0176     0.0062   0.3995   0.3251     0.1250
PAROXETINE           0.0000   0.0000     0.0000   0.0000   0.0000     0.0000

Note that due to the limitations of the way data are gathered into
pharmacovigilance systems, neither our analysis nor the analysis in Chavant et al.
(2011) can confirm that Bupropion affects memory, but the analysis does suggest
that further investigation into the effect of this drug should be pursued.

In fact, for all the drugs discovered by our analysis, further investigations 
are necessary for establishing whether the suspect drugs indeed cause amnesia. 
These further investigations may be costly, and therefore it is important not to 
launch investigations into too many false leads. The discrete procedures have
higher power to discover true associations between drug and amnesia than the
original procedures, while being careful to guarantee that only a small fraction
of the discovered associations may be false positives. These procedures are
therefore well suited to the purpose of pharmacovigilance systems.



4 A simulation study 



The power and FDR level of the various procedures are compared in simulations.
We examined simulation settings similar to the settings in Gilbert (2005), except
that we examine one-sided tests rather than the two-sided tests considered in
Gilbert (2005).



4.1 Structure of the simulation 

A vector of m = 100 binary responses is observed for each of N individuals
in each group, and the goal is to test simultaneously the m hypotheses
$H_i: p_{1i} = p_{2i}$, i = 1, ..., m, where $p_{ji}$ is the success probability for the ith
binary response in group j ($i \in \{1, \ldots, m\}$ and $j \in \{1, 2\}$). For fractions $f_1$
and $f_2$ of the m hypotheses, the null was true with success probability 0.01 and
0.10 respectively. The remaining null hypotheses were false with success probabilities
0.10 and 0.30. As $f_1$ increases, procedures that first apply a Tarone
adjustment, as detailed in step 1 of Procedure 2.1, and then an FDR controlling
procedure have more power than the FDR controlling procedure alone. For each
data set an unadjusted p-value from Fisher's exact test is computed for each of
the m positions at which there is at least one success in the pooled data set. For
independent test statistics, the data for each of the m contingency tables was
independently generated. For dependent test statistics the data was generated as
follows. For each of the N subjects in group s, $s \in \{1, 2\}$, a multivariate normal
outcome vector $X_{si} \sim N_m(\mu_s, \Sigma)$ was first generated, and $Y_{si} = I[X_{si} < 0]$.
The parameter vector $\mu_s$ was chosen to reflect probabilities of 0.1 or 0.3. The
off-diagonals in the covariance $\Sigma$ received the value $\rho \in \{0, 0.01, 0.5, 0.9\}$.
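
The independent case of this data-generating scheme can be sketched in Python with the standard library alone. The function name, the seed and the choice N = 50 are assumptions made for illustration; the probability configuration is the 20-hypothesis setting reported in the simulation tables (four nulls at 0.01, fifteen nulls at 0.10, one non-null at 0.10 versus 0.30). The dependent case, which thresholds correlated normal vectors, is not covered by this sketch.

```python
import random

def simulate_tables(n_per_group, probs, seed=0):
    """Simulate one data set of independent 2x2 tables.

    probs[i] = (p1, p2) are the success probabilities of response i in the two groups.
    Returns one (x1, n - x1, x2, n - x2) tuple of counts per hypothesis.
    """
    rng = random.Random(seed)
    tables = []
    for p1, p2 in probs:
        # Binomial counts drawn as sums of independent Bernoulli trials.
        x1 = sum(rng.random() < p1 for _ in range(n_per_group))
        x2 = sum(rng.random() < p2 for _ in range(n_per_group))
        tables.append((x1, n_per_group - x1, x2, n_per_group - x2))
    return tables

# 4 nulls at 0.01, 15 nulls at 0.10, and 1 non-null hypothesis (0.10 vs 0.30).
probs = [(0.01, 0.01)] * 4 + [(0.10, 0.10)] * 15 + [(0.10, 0.30)]
tables = simulate_tables(n_per_group=50, probs=probs)

# Only positions with at least one success in the pooled data are tested.
testable = [t for t in tables if t[0] + t[2] > 0]
print(len(tables), len(testable))
```

Each simulated table can then be passed to a Fisher's exact test routine to obtain the unadjusted p-value and its discrete null distribution.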

The 4 new procedures considered are the DBH procedure, the Tarone+midP
adjusted BH procedure, the DBL procedure, and the Tarone+midP adjusted
BL procedure. They were compared to the BH and BL procedures.

The FDR level of each procedure is the fraction of rejected hypotheses that 
are truly null out of all hypotheses rejected, averaged over the 1000 simulations. 

Table 4: The average realized FDR (SE) over 1000 simulations for various sample
sizes in each group, of a simulation study with m = 20 hypotheses, out of
which 4 hypotheses are null with success probability 0.01, 15 hypotheses are
null with success probability 0.1, and 1 hypothesis is non-null with success
probabilities 0.1 and 0.3. The rows are the results for (1) the DBH procedure; (2)
the BH procedure on the midP-values; (3) the BH procedure on the p-values;
(4) the DBL procedure; (5) the BL procedure on midP-values; (6) the BL procedure on
the p-values.



N = 25 



N = 50 



N = 75 



N=100 



N=125 



N = 150 



N = 175 



N = 200 



0.036 (0.005) 
0.008 (0.003) 
0.001 (0.001) 
0.030 (0.005) 
0.009 (0.003) 
0.001 (0.001) 



0.049 (0.005) 
0.047 (0.005) 
0.026 (0.004) 
0.033 (0.004) 
0.031 (0.004) 
0.017 (0.003) 



0.051 (0.005) 
0.043 (0.005) 
0.019 (0.003) 
0.026 (0.004) 
0.024 (0.003) 
0.011 (0.002) 



0.054 (0.006) 
0.051 (0.006) 
0.031 (0.004) 
0.034 (0.004) 
0.032 (0.004) 
0.016 (0.003) 



DBH
Tarone+midP BH
BH
DBL
Tarone+midP BL
BL



0.029 (0.004) 
0.024 (0.004) 
0.009 (0.003) 
0.022 (0.004)
0.019 (0.004) 
0.007 (0.003) 



0.056 (0.006) 
0.035 (0.006) 
0.015 (0.003) 
0.035 (0.006) 
0.021 (0.004) 
0.010 (0.003) 



0.064 (0.006) 
0.046 (0.006) 
0.024 (0.004) 
0.038 (0.006)
0.028 (0.004) 
0.018 (0.004) 



0.037 (0.004) 
0.036 (0.004) 
0.022 (0.003) 
0.022 (0.003) 
0.022 (0.003) 
0.012 (0.003) 



The power of each procedure is the fraction of non-null hypotheses that are 
rejected out of all non-null hypotheses, averaged over the 1000 simulations. 
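
The two summary measures defined above can be computed per simulated data set and then averaged. The sketch below is one way to do this and is not taken from the paper; it assumes the usual convention that the false discovery proportion is 0 when nothing is rejected, and that the reported SE is the sample standard deviation divided by the square root of the number of simulations. The function names are hypothetical.

```python
import numpy as np


def fdp_and_power(reject, is_null):
    """False discovery proportion and power for a single simulated data set."""
    reject = np.asarray(reject, dtype=bool)
    is_null = np.asarray(is_null, dtype=bool)
    n_rejected = int(reject.sum())
    n_false = int(np.sum(reject & is_null))
    # Convention (assumed): the proportion is 0 when there are no rejections.
    fdp = n_false / n_rejected if n_rejected > 0 else 0.0
    n_non_null = int(np.sum(~is_null))
    power = np.sum(reject & ~is_null) / n_non_null if n_non_null > 0 else float("nan")
    return float(fdp), float(power)


def mean_and_se(values):
    """Average over simulations with its standard error (assumed: sd / sqrt(n))."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1) / np.sqrt(values.size))
```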

4.2 Results of the simulation 

Tables 4 and 5 show the resulting FDR and power of the 6 procedures, when
applied to independent test statistics from 20 hypotheses. The 20 hypotheses
included: one false null hypothesis, with success probabilities 0.1 in one group and
0.3 in the other group; four true null hypotheses, with success probability 0.01 in
both groups; 15 true null hypotheses, with success probability 0.1 in both groups.
Examination of the two tables leads to the following two conclusions, which were
true in all simulation settings we considered. First, the discrete procedures
had higher FDR levels, but still below the nominal 0.05 level, and were more
powerful, than their non-discrete analogues (BH or BL). Second, the power
advantages were larger for smaller sample sizes N, since the data was more
discrete for smaller N.

Moreover, in Table 5 the discrete step-up and discrete step-down procedures
are comparable in terms of power, while in Table 4 the FDR level of the DBL
procedure is lower than the FDR level of the DBH procedure. For example, for N = 75 the
FDR of the DBH procedure is estimated to be 0.056 ± 0.006, whereas the FDR of
the DBL procedure is estimated to be 0.035 ± 0.005. However, the average power
of both procedures is estimated to be 0.71 ± 0.01. The average power of the BH
and BL procedures is notably lower, 0.55 ± 0.01. The midP+Tarone adjusted
procedures have estimated average power of 0.65 ± 0.02. However, as the number
of hypotheses increases (and most of the hypotheses are null), the discrete step-up
procedures tend to outperform the discrete step-down procedures, as we show
next for m = 100.

Table 5: The average power (SE) over 1000 simulations for various sample sizes
N in each group, of a simulation study with m = 20 hypotheses, out of which 4
hypotheses are null with success probability 0.01, 15 hypotheses are null with
success probability 0.1, and 1 hypothesis is non-null with success probabilities
0.1 and 0.3. The rows are the results for (1) the DBH procedure; (2) the BH
procedure on the midP-values; (3) the BH procedure on the p-values; (4)
the DBL procedure; (5) the BL procedure on the midP-values; (6) the BL
procedure on the p-values.

Procedure       N = 25         N = 50         N = 75         N = 100        N = 125        N = 150        N = 175        N = 200
DBH             0.246 (0.014)  0.454 (0.016)  0.711 (0.014)  0.842 (0.012)  0.918 (0.009)  0.970 (0.005)  0.983 (0.004)  0.991 (0.003)
Tarone+midP BH  0.209 (0.013)  0.412 (0.016)  0.654 (0.015)  0.817 (0.012)  0.912 (0.009)  0.963 (0.006)  0.982 (0.004)  0.988 (0.003)
BH              0.088 (0.009)  0.311 (0.016)  0.552 (0.016)  0.767 (0.014)  0.849 (0.011)  0.946 (0.007)  0.970 (0.006)  0.985 (0.004)
DBL             0.236 (0.013)  0.454 (0.016)  0.709 (0.014)  0.840 (0.012)  0.919 (0.009)  0.970 (0.005)  0.983 (0.004)  0.991 (0.003)
Tarone+midP BL  0.211 (0.013)  0.410 (0.016)  0.652 (0.015)  0.812 (0.012)  0.908 (0.009)  0.966 (0.006)  0.982 (0.004)  0.988 (0.003)
BL              0.088 (0.009)  0.329 (0.015)  0.552 (0.016)  0.761 (0.014)  0.860 (0.011)  0.946 (0.007)  0.972 (0.006)  0.985 (0.004)

Figure 1 displays the results for independent test statistics in a configuration
where 20 and 75 of the hypotheses are null with success probability 0.01 and
0.10 respectively, and 5 of the hypotheses are non-null with success probabilities
0.1 and 0.3. The FDR level (top row) is below the nominal 0.05 level for
all procedures, but much closer to 0.05 for the discrete procedures than for their
non-discrete analogues. The average power is displayed in the bottom row.
The first and second columns consider, respectively, the step-up and step-down
procedures. From examination of the first column, we see that the BH procedure
has the lowest FDR level and the lowest average power. The DBH procedure has the
highest power and the midP+Tarone procedure is a close second. As the sample
size increases, the gain in power from using the discrete procedures diminishes,
and at N = 200 all procedures have the same power. However, for fixed N the
gap in power between the BH procedure and the discrete procedures is similar
for m = 20 and for m = 100. From examination of the second column, we see
that the relationships between the discrete procedures and the BL procedure
are very similar to those found for the BH procedure. The FDR level of the BL
procedure is very low even for moderate sample sizes, and at N = 200 there
is still a gap between the discrete procedures and the BL procedure. Finally,
looking at the power across the two columns, we see that the step-up procedures
are more powerful than the step-down procedures.

Table 6 shows the average power of the procedures considered for a different
fraction of true null hypotheses: either 80% or 95% of the hypotheses are null.
For all procedures, the power was larger when the fraction of true null hypotheses
was smaller. The power advantage of the procedures that adjust for discreteness
over the BH or BL procedures was larger in the more difficult situation, where the
fraction of true null hypotheses was larger. For example, the power of the DBH procedure
was 0.80/0.71 = 1.13 times that of the BH procedure for $m_0/m = 0.95$, but
only 0.91/0.86 = 1.06 times that of the BH procedure for $m_0/m = 0.80$.
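
The ratios quoted above can be recomputed from the unrounded Table 6 entries. The short check below uses only the table values; the dictionary layout and variable names are illustrative. With three decimals the DBH to BH ratio at 95 true nulls comes out slightly lower than the figure obtained from the two-decimal values in the text, and the same comparison for the step-down procedures is included for completeness.

```python
# Average power from Table 6, keyed by the number of true null hypotheses (m0).
power = {
    "DBH": {80: 0.909, 95: 0.800},
    "BH": {80: 0.861, 95: 0.713},
    "DBL": {80: 0.675, 95: 0.674},
    "BL": {80: 0.586, 95: 0.559},
}

# Power of each discrete procedure relative to its non-discrete analogue.
step_up_ratio = {m0: power["DBH"][m0] / power["BH"][m0] for m0 in (80, 95)}
step_down_ratio = {m0: power["DBL"][m0] / power["BL"][m0] for m0 in (80, 95)}

for m0 in (80, 95):
    print(f"m0 = {m0}: DBH/BH = {step_up_ratio[m0]:.2f}, DBL/BL = {step_down_ratio[m0]:.2f}")
```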

Incorporating dependency among the test statistics, as described in Section
4.1, did not result in any of the procedures being anti-conservative. As in the
independence setting, the discrete procedures were more powerful (not shown).
The BH procedure on p-values appears to control the FDR in most circumstances
that are not highly artificial (Yekutieli, 2008; Romano et al., 2008).

[Figure 1 panels: step-up procedures (first column), step-down procedures (second column)]

Figure 1: The FDR level (top row) and average power (bottom row) vs. sample
size, for the 3 step-up procedures (first column) and the 3 step-down procedures
(second column). The number of hypotheses is m = 100. 5 hypotheses are non-null
with success probabilities 0.1 and 0.3. The success probabilities of the null
hypotheses were 0.01 for 20 hypotheses, and 0.10 for 75 hypotheses.

Table 6: The average power (SE) over 1000 simulations of a simulation study
with m = 100 hypotheses, N = 100 subjects in each group, for which the null
hypotheses have success probabilities 0.1, and the non-null hypotheses have
success probabilities 0.1 and 0.3. The columns vary in the number of true null
hypotheses: 80 null hypotheses in the first column, and 95 null hypotheses in
the second column. The rows are the results for (1) the DBH procedure; (2) the
BH procedure on the midP-values; (3) the BH procedure on the p-values; (4)
the DBL procedure; (5) the BL procedure on the midP-values; (6) the BL
procedure on the p-values.

Procedure                 m_0 = 80       m_0 = 95
DBH                       0.909 (0.002)  0.800 (0.006)
Tarone+midP adjusted BH   0.895 (0.002)  0.772 (0.007)
BH                        0.861 (0.003)  0.713 (0.007)
DBL                       0.675 (0.003)  0.674 (0.007)
Tarone+midP adjusted BL   0.644 (0.003)  0.616 (0.007)
BL                        0.586 (0.004)  0.559 (0.007)


This robustness property appears in our simulations to be carried over to the 
discrete analogues of the BH procedure. Similarly, the step-down procedures 
were robust to deviations from independence, as considered in our simulations. 



5 Summary 

We demonstrated that the FDR level may be much lower than the nominal level 
q when applying the BH or BL procedures at level q on discrete test statistics. 
By adjusting for discreteness, it was possible to achieve tighter control of the 
FDR and higher power. 

In the simulations considered, the DBH and DBL procedures were more
powerful than the Tarone+midP adjusted BH and BL procedures respectively.
However, there are important situations where the Tarone+midP adjusted
procedures are more powerful. Specifically, when all the null hypotheses are
identical, the DBH and DBL procedures are identical to the BH and BL procedures
respectively, yet the Tarone+midP adjusted procedures are more powerful. Just as
Westfall and Wolfinger (1997) observed for the discrete Bonferroni procedure, the gain in
power of the DBH and DBL procedures comes from the fact that the tests are
not identically distributed under the null.

A Proof of Proposition 2.1

Proof. Let $a_x(i)$ and $b_x(i)$ be the atoms (where an atom is a value the discrete
p-value can receive) of the $i$th discrete p-value just below and just above $x \in (0, 1)$
for null distribution $i$, and let $I[\cdot]$ be the indicator function. Then

\[
\Pr(midP_i \le kq/m) = a_{kq/m}(i) + [b_{kq/m}(i) - a_{kq/m}(i)]\, I\!\left[\frac{a_{kq/m}(i) + b_{kq/m}(i)}{2} \le kq/m\right]. \qquad (1)
\]

Let $C_k^{(i)}$ denote the event that exactly $k - 1$ hypotheses are rejected by the BH
procedure on midP-values along with true null hypothesis $H_i$. The FDR level
of this procedure may be expressed as follows:

\[
midFDR = \sum_{i \in I_0} \sum_{k=1}^{m} \frac{1}{k} \Pr(midP_i \le kq/m) \Pr(C_k^{(i)})
\]
\[
= origFDR + \sum_{i \in I_0} \sum_{k=1}^{m} \frac{1}{k} [b_{kq/m}(i) - a_{kq/m}(i)]\, I\!\left[\frac{a_{kq/m}(i) + b_{kq/m}(i)}{2} \le kq/m\right] \Pr(C_k^{(i)}), \qquad (2)
\]

where $origFDR = \sum_{i \in I_0} \sum_{k=1}^{m} \frac{1}{k} a_{kq/m}(i) \Pr(C_k^{(i)})$ is the FDR level of the BH
procedure on the p-values. Since $b_{kq/m}(i) - a_{kq/m}(i) \ge 0$, it follows that $midFDR \ge origFDR$.
It remains to show that $midFDR \le 2\frac{m_0}{m}q - origFDR$. We will use
the following simple lemma:

Lemma A.1. If $(b_x(i) + a_x(i))/2 \le x$, then $b_x(i) - a_x(i) \le 2x - 2a_x(i)$.

Proof. The result is immediate from $(b_x(i) + a_x(i))/2 \le x \iff b_x(i) \le 2x - a_x(i)$. □

Applying the lemma to each term of the sum in equation (2), and using
$\sum_{k=1}^{m} \Pr(C_k^{(i)}) = 1$ for each $i \in I_0$, it follows that
$midFDR \le origFDR + 2\frac{m_0}{m}q - 2\,origFDR$, so the result follows. □
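
Equation (1) can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not part of the proof: it takes a one-sided binomial test (an arbitrary choice), uses the standard definition of the midP-value as Pr(T > t) + 0.5 Pr(T = t), and compares a brute-force evaluation of the null distribution of the midP-value with the closed form in terms of the atoms just below and just above x. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import binom


def pvalue_atoms(n, p0):
    """Attainable values of the one-sided p-value Pr(T >= t), T ~ Binomial(n, p0)."""
    t = np.arange(n + 1)
    return np.sort(binom.sf(t - 1, n, p0))   # Pr(T >= t) for t = 0..n, ascending


def midp_cdf_bruteforce(atoms, x):
    """Pr(midP <= x) under the null, summing the mass of every qualifying atom."""
    prev = np.concatenate(([0.0], atoms[:-1]))
    mass = atoms - prev                       # Pr(P = atom) under the null
    midp = (atoms + prev) / 2.0               # midP-value attached to each atom
    return mass[midp <= x].sum()


def midp_cdf_formula(atoms, x):
    """Right-hand side of equation (1): a_x + (b_x - a_x) I[(a_x + b_x)/2 <= x]."""
    below = atoms[atoms <= x]
    a = below.max() if below.size else 0.0    # atom just below x (0 if none)
    b = atoms[atoms > x].min()                # atom just above x (exists for x < 1)
    return a + (b - a) * ((a + b) / 2.0 <= x)
```

For instance, with `atoms = pvalue_atoms(12, 0.3)` the two functions agree for any x in (0, 1) up to floating-point error.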

B Proof of Proposition 2.2

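Since, for every $i$ and $k$, the ratio of $\Pr(midP_i \le kq/m)$ to $kq/m$ is at most

\[
\max_{i,k} \frac{a_{kq/m}(i) + [b_{kq/m}(i) - a_{kq/m}(i)]\, I\!\left[\frac{a_{kq/m}(i) + b_{kq/m}(i)}{2} \le kq/m\right]}{kq/m},
\]

then:

\[
midFDR = \sum_{i \in I_0} \sum_{k=1}^{m} \frac{1}{k} \left\{ a_{kq/m}(i) + [b_{kq/m}(i) - a_{kq/m}(i)]\, I\!\left[\frac{a_{kq/m}(i) + b_{kq/m}(i)}{2} \le kq/m\right] \right\} \Pr(C_k^{(i)})
\]
\[
= \sum_{i \in I_0} \sum_{k=1}^{m} \frac{1}{k} \cdot \frac{kq}{m} \cdot \frac{a_{kq/m}(i) + [b_{kq/m}(i) - a_{kq/m}(i)]\, I\!\left[\frac{a_{kq/m}(i) + b_{kq/m}(i)}{2} \le kq/m\right]}{kq/m} \Pr(C_k^{(i)})
\]
\[
\le \max_{i,k} \frac{a_{kq/m}(i) + [b_{kq/m}(i) - a_{kq/m}(i)]\, I\!\left[\frac{a_{kq/m}(i) + b_{kq/m}(i)}{2} \le kq/m\right]}{kq/m} \cdot \frac{q}{m} \sum_{i \in I_0} \sum_{k=1}^{m} \Pr(C_k^{(i)})
\]
\[
= \frac{m_0}{m}\, q \, \max_{i,k} \frac{a_{kq/m}(i) + [b_{kq/m}(i) - a_{kq/m}(i)]\, I\!\left[\frac{a_{kq/m}(i) + b_{kq/m}(i)}{2} \le kq/m\right]}{kq/m}. \qquad (3)
\]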



C Proof of Proposition 2.4

Proof. First, if $m_0 = 0$ then FDR = 0 since there are no false rejections. Second,
if $m_0 = m$ then the only cut-off of interest is
$\delta_1 = \max\{z : 1 - \prod_{i=1}^{m} (1 - \Pr_{H_i}(P_i \le z)) \le q\}$, therefore

\[
FDR = FWER = \Pr(P_{(1)} \le \delta_1) = 1 - \prod_{i=1}^{m} (1 - \Pr_{H_i}(P_i \le \delta_1)) \le q,
\]

where the last inequality follows from the definition of $\delta_1$.

It remains to prove the proposition for $0 < m_0 < m$. Let $m_1 = m - m_0 > 0$,
and let $I_1$ and $I_0$ be the index sets of the $m_1$ and $m_0$ p-values corresponding to the
false null and true null hypotheses respectively. Let $P' = \{P'_1, \ldots, P'_{m_1}\}$ be the
p-values corresponding to false null hypotheses.

We will show that E{Q\P') < q, from which the proposition clearly fol- 
lows. Let P|.'j^^ < . • . < P(,„) be the sorted p-values corresponding to false null 

hypotheses. Let S G {0,...,mi} be the largest integer i satisfying P^'-^^ <Si 

, • ■ • , P(,) <dz, where 5 = if P(\) >Si. Note that R > S+V since di< ■ ■ ■ <Sm- 
Now we have 

EiQ\P') = E{^I[V > 0]|P') < E{^^I[V > 0]|P') < s^/riV > 0|P') 

(4) 

For an index set $K$ of null hypotheses, define the constant
$c_{q,K} = \max\{z : 1 - \prod_{i \in K} (1 - \Pr_{H_i}(P_i \le z)) \le q\}$.

One can easily verify that for $K \subseteq K'$ we have $c_{q,K} \ge c_{q,K'}$.

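Let $r'_1, \ldots, r'_S$ be the indices in the vector of p-values of the $S$ smallest
p-values corresponding to false null hypotheses. Then

\[
\Pr(V > 0 | P') = \Pr\!\left(\min_{i \in I_0} P_i \le c_{q\frac{m}{m-S},\, \{1, \ldots, m\} \setminus \{r'_1, \ldots, r'_S\}}\right). \qquad (5)
\]

Since $\{r'_1, \ldots, r'_S\} \subseteq I_1$, then $I_0 \subseteq \{1, \ldots, m\} \setminus \{r'_1, \ldots, r'_S\}$. Therefore,
$c_{q\frac{m}{m-S},\, \{1, \ldots, m\} \setminus \{r'_1, \ldots, r'_S\}} \le c_{q\frac{m}{m-S},\, I_0}$. It follows that

\[
\Pr(V > 0 | P') \le \Pr\!\left(\min_{i \in I_0} P_i \le c_{q\frac{m}{m-S},\, I_0}\right) \le \frac{m}{m - S}\, q. \qquad (6)
\]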



Combining equations (4) and (6) we have

\[
E(Q|P') \le \frac{m_0}{S + m_0} \cdot \frac{m}{m - S}\, q \le q, \qquad (7)
\]

where the last inequality follows from the fact that $m_0 + S \le m$.
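
The last step can be spelled out by cross-multiplying. A short derivation, using only $S \ge 0$, $m_0 > 0$ and $m_0 + S \le m$ (so that $m - S > 0$):

\[
\frac{m_0}{S + m_0} \cdot \frac{m}{m - S} \le 1
\iff m_0 m \le (S + m_0)(m - S) = m_0 m + S\,(m - m_0 - S)
\iff 0 \le S\,(m - m_0 - S),
\]

and the right-hand side is non-negative because $S \ge 0$ and $m_0 + S \le m$.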



□ 
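
The constant defined in the proof for an index set K of null hypotheses, and its monotonicity in K, can be illustrated numerically. The sketch below rests on assumptions that are not in the text: each null p-value is described by its attainable values with Pr(P <= atom) = atom, and the maximum is taken over a finite common grid (the union of the attainable values of all hypotheses), which leaves the rejection decisions unchanged. The names `null_cdf` and `cutoff` are hypothetical.

```python
import numpy as np


def null_cdf(atoms, z):
    """Pr(P <= z) for a discrete null p-value with the given attainable values."""
    atoms = np.asarray(atoms, dtype=float)
    below = atoms[atoms <= z]
    return float(below.max()) if below.size else 0.0


def cutoff(q, atoms_in_k, grid):
    """Largest grid point z with 1 - prod_{i in K} (1 - Pr(P_i <= z)) <= q (0.0 if none)."""
    best = 0.0
    for z in np.sort(np.asarray(grid, dtype=float)):
        level = 1.0 - np.prod([1.0 - null_cdf(a, z) for a in atoms_in_k])
        if level > q:
            break                 # the level is non-decreasing in z, so stop here
        best = float(z)
    return best


# Three hypothetical null distributions and their common grid.
A, B, C = [0.01, 0.2, 1.0], [0.03, 0.5, 1.0], [0.02, 0.04, 1.0]
grid = np.unique(np.concatenate([A, B, C]))
c_small = cutoff(0.05, [A], grid)          # K = {A}
c_mid = cutoff(0.05, [A, B], grid)         # K' = {A, B}
c_large = cutoff(0.05, [A, B, C], grid)    # K'' = {A, B, C}
```

In this example the cut-off does not increase as hypotheses are added to the index set, in line with the monotonicity property used in the proof.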
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